NoiLIn: Improving Adversarial Training and Correcting Stereotype of Noisy Labels
Jingfeng Zhang, Xilie Xu, Bo Han, Tongliang Liu, Gang Niu, Lizhen Cui, Masashi Sugiyama
Adversarial training (AT), formulated as a minimax optimization problem, can effectively enhance a model's robustness against adversarial attacks. Existing AT methods mainly focus on manipulating the inner maximization to generate quality adversarial variants, or on manipulating the outer minimization to design effective learning objectives. However, empirical results of AT consistently exhibit robustness at odds with accuracy, along with the cross-over mixture problem, which motivates us to study whether label randomness can benefit AT. First, we thoroughly investigate noisy-label (NL) injection into AT's inner maximization and outer minimization, respectively, and obtain observations on when NL injection benefits AT. Second, based on these observations, we propose a simple but effective method, NoiLIn, which randomly injects NLs into the training data at each training epoch and dynamically increases the NL injection rate once robust overfitting occurs. Empirically, NoiLIn can significantly mitigate AT's undesirable issue of robust overfitting and even further improve the generalization of state-of-the-art AT methods. Philosophically, NoiLIn sheds light on a new perspective on learning with NLs: NLs should not always be deemed detrimental, and even in the absence of NLs in the training set, we may consider injecting them deliberately. Code is available at https://github.com/zjfheart/NoiLIn.
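The mechanism the abstract describes, random per-epoch label flipping plus a rate that grows when robust overfitting sets in, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the symmetric-flip choice, the step size, the cap, and the use of a drop in robust validation accuracy as the overfitting signal are all assumptions for the sake of the example.

```python
import random

def inject_noisy_labels(labels, num_classes, rate, rng=None):
    """Symmetric NL injection: with probability `rate`, replace a label
    with a uniformly chosen *different* class (a common noise model;
    assumed here, not taken from the paper)."""
    rng = rng or random.Random(0)
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy

def update_injection_rate(rate, robust_acc, best_robust_acc,
                          step=0.05, max_rate=0.4):
    """Increase the injection rate once robust validation accuracy falls
    below its best value so far -- a simple proxy for the onset of robust
    overfitting. `step` and `max_rate` are illustrative hyperparameters."""
    if robust_acc < best_robust_acc:
        rate = min(rate + step, max_rate)
    return rate
```

Per epoch, one would call `inject_noisy_labels` on the clean training labels, run the AT epoch on the noisy copy, then call `update_injection_rate` with the epoch's robust validation accuracy.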
Fooling Adversarial Training with Inducing Noise
Zhirui Wang, Yifei Wang, Yisen Wang
Adversarial training is widely believed to be a reliable approach to improving model robustness against adversarial attacks. However, in this paper, we show that when trained on one type of poisoned data, adversarial training can also be fooled into catastrophic behavior, e.g., 1% robust test accuracy with 90% robust training accuracy on the CIFAR-10 dataset. Previous work has poisoned training data with other types of noise that successfully fool standard training (15.8% standard test accuracy with 99.9% standard training accuracy on CIFAR-10), but such poisoning is easily removed by adopting adversarial training. We therefore aim to design a new type of inducing noise, named ADVIN, which is an irremovable poisoning of training data. ADVIN can not only degrade the robustness of adversarial training by a large margin, for example, from 51.7% to 0.57% on CIFAR-10, but is also effective at fooling standard training (13.1% standard test accuracy with 100% standard training accuracy). Additionally, ADVIN can be applied to prevent personal data (like selfies) from being exploited without authorization under either standard or adversarial training.

In recent years, deep learning has achieved great success, while the existence of adversarial examples (Szegedy et al., 2014) alerts us that existing deep neural networks are very vulnerable to adversarial attacks. Adversarial Training (AT) is currently the most effective approach against adversarial examples (Madry et al., 2017; Athalye et al., 2018). In practice, adversarially trained models have shown good robustness under various attacks, and recent state-of-the-art defense algorithms (Zhang et al., 2019; Wang et al., 2020) are all variants of adversarial training. It is therefore widely believed that we have already found the cure for adversarial attacks, i.e., adversarial training, on which we can build trustworthy models to a certain degree.
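The AT framing that both abstracts rely on is the minimax problem of Madry et al.: an inner maximization that finds a worst-case bounded perturbation, and an outer minimization that trains on the perturbed inputs. As a hedged illustration only, the sketch below implements the inner PGD step for a logistic-loss linear classifier in plain NumPy; the model, loss, and hyperparameters are assumptions chosen so the example is self-contained, and a real AT pipeline would use a deep network and a framework's autograd.

```python
import numpy as np

def pgd_attack(w, x, y, eps, alpha, steps):
    """Inner maximization: find delta with ||delta||_inf <= eps that
    maximizes the logistic loss log(1 + exp(-y * w @ (x + delta)))
    of a linear classifier, via projected gradient ascent."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        margin = y * w @ (x + delta)
        # gradient of the logistic loss w.r.t. delta
        grad = -y * w / (1.0 + np.exp(margin))
        # signed ascent step, then projection onto the eps-ball
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta

def at_step(w, x, y, eps, alpha, steps, lr):
    """Outer minimization: one gradient-descent step on the loss
    evaluated at the adversarial point x + delta."""
    delta = pgd_attack(w, x, y, eps, alpha, steps)
    x_adv = x + delta
    margin = y * w @ x_adv
    grad_w = -y * x_adv / (1.0 + np.exp(margin))
    return w - lr * grad_w
```

The poisoning attacks discussed in the abstract work against this loop by shaping the training data so that the inner maximization no longer produces useful worst-case examples.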